APPARATUS AND METHOD FOR
COLLABORATIVE DYNAMIC VIDEO ANNOTATION 



FIELD OF THE INVENTION

The present invention relates to the field of collaborative video annotation and,
more specifically, to apparatus for enabling multiple users to share their views about
video content.

BACKGROUND OF THE INVENTION

Reference is made to a patent application entitled "Method and Apparatus for
Creating Dynamic Object Markers in a Video Clip", filed on even date herewith and
assigned to the same assignee as the present application, the disclosure of which is
herein incorporated by reference to the extent it is not incompatible with the present
application.

A situation can arise wherein two or more users wish to communicate in
reference to a common object, for instance, a video. One example is a soccer team
coach who wishes to consult a colleague for advice. The coach might wish to show a
taped video of a game and ask the colleague to explain, using the video, why one team
failed to score in a given attack situation. In addition, the coach might wish to record
this discussion and show it later to other coaches to get more opinions.

In another scenario, a student could be taking a training course given at a
location remote from the course instructor. The student may not understand a
procedure being taught in the course, and can then call the instructor over an Internet
phone to find out how the procedure should be performed. The instructor can first
browse through the training video together with the student to find the clip where the
difficulty can be identified. The student may then ask the instructor various questions
about that procedure. The instructor may then decide, for example, to show the student
another video that offers more detailed information, and annotate this video using
collaborative video annotation tools to explain to the student how the procedure should
be performed.

A need exists for systems and products that provide services such as those
described above. One such product is Sprint's Drums system, which allows two users
to view video simultaneously by using the Shared Movie Player that runs on Silicon
Graphics, Inc. (SGI) computers. Shared video playback starts with one of the users
sending the video file, in SGI Movie Player format, to the other user. Once the complete
video has been transferred, either of the two users can initiate video playback. Playback
control is also shared: either user can pause the video, jump to a random position in the
video by use of a scrollbar, or play the video in reverse.

However, the Shared Movie Player generally does not provide certain
features, such as graphical annotation on top of the video window. In order to add
graphical annotations, the user has to pause the video and copy and paste the current
frame into an ordinary shared whiteboard application.

A Tele-Pointer, which is a device for controlling the appearance and
position of a cursor or pointer on computer displays from a remote location, is
also typically not provided; the video window itself is not shared, and the users have
no means of sharing a pointing device in either play or pause mode. Integrated audio
conferencing, mixing conference audio with the video sound track, is generally not
provided either; a regular telephone connection is typically used for user-to-user
dialogue.

In such a system, neither recording and playback of a shared playback session
nor multi-user conferencing is typically provided; the Shared Movie Player only works
for point-to-point conferencing.

Yet another product, Creative Partner from eMotion [2,3], contains three video
annotation tools, but annotations must be recorded off-line while video playback is
paused; there is no on-line collaboration support. The three annotation tools include
one for graphical annotation, one for text annotation, and one for audio annotation. The
Creative Partner video player allows the user to control video playback and to invoke
one of the three annotation tools provided. Annotations can only be attached to the video
in pause mode. The user selects the appropriate annotation tool and points to the image
coordinate on the video frame to which the annotation is to be attached, and can then
record the annotation. The annotation is associated only with the frame to which it was
attached, not with a video segment. During playback, the Creative Partner video player
pauses at every video frame that has annotations attached; the user must then activate
playback of any audio annotations and resume video playback. The annotations are
removed from the video window once video playback is resumed.

Helpful background information can be found in U.S. Patent No. 5,600,775,
issued February 4, 1997 in the names of King et al. and entitled METHOD AND
APPARATUS FOR ANNOTATING FULL MOTION VIDEO AND OTHER
INDEXED DATA STRUCTURES, and at Internet site
http://www.emotion.com/html/creativepartner product page.html.

SUMMARY OF THE INVENTION 
It is herein recognized that there is a continuing need for an apparatus that
provides the following functional features, which relate to aspects of the present
invention:

on-line multi-point group discussions on video content over heterogeneous
networks, with Tele-Pointer support;

synchronized video playback, overlaid with voice comments as well as
dynamic graphical annotation during group discussion;

dynamic adjustment of playback speed during synchronized video playback,
and recording of group discussion sessions;

subsequent on-line multi-point group discussions based on an existing recorded
annotation session, with annotation during synchronized playback of the recorded
annotations; and

attachment of any tool to help browse video content and create dynamic
markers for static as well as dynamic objects.

In collaborative dynamic video annotation applications, it is generally
considered unlikely that the participants in the group discussion will all own the same
type of computer equipment or be physically present in the same building. It is also
generally unlikely that each participant can be required to have an Internet connection
of equal data rate.

It is herein recognized that a desirable solution to the above problems should
provide enough flexibility to overcome the problems arising from heterogeneous
environments. For example, a desirable solution, in accordance with an aspect of the
present invention, can allow people to use a Public Switched Telecommunications
Network (PSTN) if audio quality is the main concern, and Internet Phone (IP) if cost is
the greater concern.

Tele-Pointers are an important part of a group discussion; without them, it is
practically impossible to know what each participant is pointing at. Since video is a
dynamic document, it is herein recognized, in accordance with an aspect of the present
invention, that it is particularly helpful if each participant can make their own cursor
visible on the screens of the other participants.

Since video content is difficult to describe verbally, it is herein recognized, in
accordance with an aspect of the present invention, that it is important for all
participants to see the same video frame at the same time. It is also required that any
participant be able to annotate on top of a video frame, for example with graphical
drawings or text, independently of the state of the video player, and that all
participants see the same annotations on their screens at the same time. In addition,
since participants are usually located at respective remote locations, the ability to have
full duplex multi-point voice communication is considered essential. The system should
preferably also be able to mix the audio track of the video with the voices of all
participants at all times.

Annotating simultaneously with voice and graphical drawings while the video
is playing is not always straightforward. It is therefore herein recognized that, in
accordance with an aspect of the present invention, any participant should be able to
adjust the shared video playback speed dynamically during the group discussion. The
entire group discussion should preferably be recordable and able to be played back in
the same sequence in which it happened. The recording should preferably include all
VCR commands, graphical drawings/texts, and voice comments, time-stamped for later
playback.

Playback of a recorded annotation can occur in a stand-alone mode or an
on-line collaboration mode. In the latter case, in accordance with an aspect of the
present invention, the playback of the recorded annotation should be synchronized
among all participants, and any participant should be able to annotate while the
playback is going on and to record new annotations in a separate record.

In accordance with an aspect of the present invention, during a collaborative
dynamic video annotation, any participant is able to use add-on tools to facilitate the
discussion. One such tool is a video browser, which allows a user to jump to random
points in the video. While the video is being played, the corresponding places in the
tool are preferably highlighted to reflect the frame that is currently being played on the
screen of each participant. Another tool that may be implemented allows any
participant to create dynamic object markers. A dynamic object marker is a graphical
drawing (usually assembled from polygons) that highlights the location of an object of
interest in a video frame. A dynamic marker can indicate either a dynamic object or a
dynamic parameter relating to a steady object. Since the location of a dynamic object
generally changes from frame to frame, this tool is provided in accordance with the
invention to help locate the object in all frames between a selected pair of video frames;
when invoked, it creates a marker for every frame between the selected pair. The system
preferably also provides a tool that creates dynamic markers for steady objects. This is
useful when the object of interest carries dynamic information parameters, for
example, current flow in an electrically conductive wire.

In accordance with the present invention, a computer-based system or apparatus
provides collaborative dynamic video annotation, recording of such a collaborative
session, synchronized playback of such a recorded annotation, and annotation/recording
during playback of a recorded annotation. The apparatus comprises a computer
readable storage medium having a computer program stored thereon for performing the
steps of: (a) choosing a network service and starting or joining a conference; (b) loading
a video or a recorded annotation file; (c) performing simultaneous graphical, text, and
audio annotation, with the support of Tele-Pointers, VCR controls, video browsers, and
dynamic marker creation tools; and (d) recording a collaborative session.

The system in accordance with the invention follows a client/server model. The
client essentially comprises a shared multi-media player with synchronized multi-point
VCR control. The window that displays live video frames is also a drawing board on
which participants at different clients can place graphical objects at the same time.
In addition, the system provides a multi-point full duplex voice connection, and the
joint voice comments are mixed with the audio track of the video currently being
played.

The network service in Step (a) may be any of Internet TCP/IP, IPX, modem,
and serial connections. The video file described in Step (b) is stored locally at the
client. Step (c) can be executed regardless of whether the video player is in play or
pause mode. Playback among multiple clients is loosely synchronized, as explained
below, and the speed of video playback can be adjusted dynamically during
synchronized playback. Step (d) records all visible activities in the collaborative
session, such as lines, texts, Tele-Pointers, and markers, as well as the voice comments
exchanged during the session.

It is important for any shared application to serialize all events that occur
during a session. In order to implement a shared but synchronized multi-media player
while giving all participants equal access to the VCR control, the player's action is
deferred when a VCR button is pressed. Instead of being interpreted immediately on the
client machine, a VCR command is sent to the server, serialized, and sent back to all
client machines. Only after receiving the serialized VCR commands from the server
does the player take action.
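The deferred-action scheme above can be sketched in a single process as follows. This is a minimal illustration, not the described implementation: in-process callbacks stand in for network delivery, and the class and method names (`AnnotationServer`, `Client`, `submit`, `press`) are hypothetical. The key property is that a client never acts on its own button press, only on the copy echoed back by the server, so every client acts on the same global order.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SerializedCommand:
    seq: int       # position in the single global order assigned by the server
    sender: str    # client whose button press produced the command
    command: str   # e.g. "JUMP-PLAY 300 1.0"


class AnnotationServer:
    """Hypothetical server that serializes commands and echoes them to all clients."""

    def __init__(self) -> None:
        self._seq = itertools.count(1)
        self._clients: List[Callable[[SerializedCommand], None]] = []

    def register(self, deliver: Callable[[SerializedCommand], None]) -> None:
        self._clients.append(deliver)

    def submit(self, sender: str, command: str) -> None:
        # Stamp the command with the next sequence number, then broadcast it
        # to every registered client, including the sender.
        msg = SerializedCommand(next(self._seq), sender, command)
        for deliver in self._clients:
            deliver(msg)


class Client:
    def __init__(self, name: str, server: AnnotationServer) -> None:
        self.name = name
        self.server = server
        self.executed: List[SerializedCommand] = []  # what the local player acted on
        server.register(self.on_server_message)

    def press(self, command: str) -> None:
        # The local player does NOT act here; the command only goes to the server.
        self.server.submit(self.name, command)

    def on_server_message(self, msg: SerializedCommand) -> None:
        # The player acts only on commands that come back from the server.
        self.executed.append(msg)
```

For example, if client A presses PLAY and client B then presses PAUSE, both clients end up with the identical list `[JUMP-PLAY 300 1.0, JUMP-PAUSE 412]` with sequence numbers 1 and 2. In a networked deployment the same idea holds: whichever order the server's single queue assigns is the order every player follows.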

In order to synchronize video playback among all participants, some
cooperation is required between all video players. This can be done on a
frame-by-frame basis, but that is very costly and difficult to realize in practice without
sacrificing playback quality. In accordance with the present invention, it is considered
preferable to synchronize on a command-by-command basis. The current frame at the
time a VCR control button is pressed is recorded, and each traditional VCR command is
converted into one of two new types of VCR command, namely "JUMP-PLAY
frame-number frame-rate" and "JUMP-PAUSE frame-number". For example, suppose
a PLAY button is pressed while the player is on frame 300. The message sent to the
server, and eventually received by all clients, will be "JUMP-PLAY 300 1.0" instead of
"PLAY"; that is, each VCR will seek to frame 300 and then start playing the video at
normal speed.
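The conversion from traditional VCR buttons into the two synchronizing commands can be sketched as below. Only the PLAY example (frame 300, rate 1.0) comes from the text; the mappings for STOP, step, and slider SEEK are assumptions made for illustration, and fast forward/rewind (which would presumably map to JUMP-PLAY with a different rate) are omitted. The names `PlayerState` and `to_sync_command` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlayerState:
    frame: int              # frame on screen when the button was pressed
    rate: float = 1.0       # playback speed; 1.0 means normal speed, as in the example
    total_frames: int = 1   # number of frames in the loaded video


def to_sync_command(button: str, state: PlayerState,
                    target: Optional[int] = None) -> str:
    """Convert a traditional VCR button into a JUMP-PLAY or JUMP-PAUSE message."""
    last = state.total_frames - 1
    if button == "PLAY":
        return f"JUMP-PLAY {state.frame} {state.rate:.1f}"
    if button == "PAUSE":
        return f"JUMP-PAUSE {state.frame}"
    if button == "STOP":
        # Assumption: STOP returns every player to the first frame and pauses.
        return "JUMP-PAUSE 0"
    if button == "STEP_FORWARD":
        return f"JUMP-PAUSE {min(state.frame + 1, last)}"
    if button == "STEP_BACKWARD":
        return f"JUMP-PAUSE {max(state.frame - 1, 0)}"
    if button == "SEEK" and target is not None:
        # Assumption: a slider jump pauses at the (clamped) target frame.
        return f"JUMP-PAUSE {max(0, min(target, last))}"
    raise ValueError(f"unsupported button: {button}")
```

Because every message carries an absolute frame number, a client that has drifted slightly during playback is pulled back into step by the next command, which is what makes loose, command-by-command synchronization sufficient.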

Another component of the present invention is the ability to mix audio signals
and to overlay graphics on video frames. Some known audio/graphics hardware, such
as the Parallax board, offers such functionality. Even if graphic overlay is supported by
hardware, the system still needs to handle the drawing of Tele-Pointers and graphical
annotations separately, because Tele-Pointers may occlude graphical annotations, and
the graphical annotations need to be restored once the Tele-Pointers move away from
their current positions.
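One way to meet the restore requirement is to keep annotations and Tele-Pointers in separate layers and recomposite them over each video frame, so a pointer is never written into the annotation data. The sketch below uses that layered-compositing approach as an illustrative substitute (the document does not specify a technique), with single-character strings standing in for pixels; the function name `composite` is hypothetical.

```python
from typing import Dict, List, Tuple

Pixel = str                # stand-in for a real pixel value
Point = Tuple[int, int]    # (x, y) in frame coordinates


def composite(frame: List[List[Pixel]],
              annotations: Dict[Point, Pixel],
              pointers: Dict[str, Point],
              pointer_pixel: Pixel = "P") -> List[List[Pixel]]:
    """Draw video, then annotations, then Tele-Pointers, without mutating any layer."""
    out = [row[:] for row in frame]          # copy of the decoded video frame
    for (x, y), px in annotations.items():   # persistent annotation layer
        out[y][x] = px
    for (x, y) in pointers.values():         # transient pointer layer, drawn last
        out[y][x] = pointer_pixel
    return out
```

When a pointer moves, the next call simply draws it at its new position; the annotation it was covering reappears automatically because the annotation layer itself was never overwritten.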

In order for the system to play a recorded annotation session synchronously on
the screens of all participants, it is only necessary to record, with timestamps, all the
messages that reached the server during the discussion session, and to send them to all
clients according to their timestamps during playback. This also allows each
participant to annotate during playback of a recorded annotation.
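A minimal sketch of the record-and-replay idea, assuming messages are plain strings and time is passed in explicitly (a real server would read a clock). `SessionRecorder` and `due_messages` are hypothetical names. Offsets are stored relative to the session start so that a replay can begin at any later time; during replay the server would call `due_messages` periodically and broadcast the result, while live messages from participants continue through the normal server path and can be captured by a second, separate recorder.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Entry = Tuple[float, str]   # (seconds since session start, serialized message)


@dataclass
class SessionRecorder:
    start: float
    log: List[Entry] = field(default_factory=list)

    def on_message(self, now: float, msg: str) -> None:
        # Store the offset from the session start rather than wall-clock time.
        self.log.append((now - self.start, msg))


def due_messages(log: List[Entry], playback_start: float,
                 now: float, cursor: int) -> Tuple[List[str], int]:
    """Return the recorded messages whose time has come, plus the advanced cursor."""
    out: List[str] = []
    while cursor < len(log) and playback_start + log[cursor][0] <= now:
        out.append(log[cursor][1])
        cursor += 1
    return out, cursor
```

Because replay simply re-injects the original server messages in their original order and spacing, every client reproduces the session exactly as it would have in the live discussion.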

In accordance with an aspect of the invention, a method for dynamic video
annotation among a plurality of users at respective locations, utilizing programmable
computer apparatus with information storage and retrieval capability, comprises the
steps of: selecting a network service coupled to the computer; performing one of (a)
starting and (b) joining a collaborative session among the users; loading one of (a) a
video and (b) a recorded annotation file; performing annotation of at least one of
graphical, text, and audio annotation; and storing the collaborative session.

In accordance with another aspect of the invention, a method for dynamic video
annotation among a plurality of users, utilizing programmable computer apparatus with
information storage and retrieval capability, comprises the steps of: selecting a network
service coupled to the computer; joining a collaborative session among the users;
loading one of (a) a video and (b) a recorded annotation file; performing annotation of
at least one of graphical, text, and audio annotation; and storing the collaborative
session.

In accordance with another aspect of the invention, apparatus for dynamic video
annotation among a plurality of users comprises: programmable computer apparatus
with information storage and retrieval capability; a user interface coupled to the
computer apparatus for performing selection of a network service; a user interface
coupled to the computer apparatus for performing one of (a) starting and (b) joining a
collaborative session among the users; a user interface coupled to the computer
apparatus for loading one of (a) a video and (b) a recorded annotation file; a user
interface coupled to the computer apparatus for performing annotation of at least one of
graphical, text, and audio annotation; and a user interface coupled to the computer
apparatus for storing the collaborative session.

In accordance with another aspect of the invention, apparatus is provided for
enabling a plurality of users at respective locations to participate in a collaborative
session regarding the content of a video, to record such a collaborative session, to
annotate/record during playback of a recorded session, and to play back such a
recorded annotated session synchronously. The apparatus comprises: a shared video
player/recorder function (VCR) available to each of the users, with multi-point VCR
control exhibiting dynamic speed adjustment, and an ability to show dynamic markers;
a function by which any of the users can play or stop the video, jump to a different
location in the video, and dynamically change the video play speed, the shared video
player/recorder function available to each of the users being synchronized at the same
video frame whenever any VCR activity occurs; and apparatus for displaying a dynamic
marker when a frame to which such a marker is attached is displayed.

In accordance with another aspect of the invention, apparatus is provided for
enabling a plurality of users at respective locations to participate in a collaborative
session regarding the content of a video, to record such a collaborative session, to
annotate/record during playback of a recorded session, and to play back such a
recorded annotated session synchronously. The apparatus comprises: a shared video
player/recorder function (VCR) exhibiting a window available to each of the users,
with multi-point VCR control exhibiting dynamic speed adjustment, and an ability to
show dynamic markers; a function by which any of the users can play or stop the video,
jump to a different location in the video, and dynamically change the video play speed,
the shared video player/recorder function available to each of the users being
synchronized at the same video frame whenever any VCR activity occurs; apparatus for
displaying a dynamic marker when a frame to which such a marker is attached is
displayed; and the shared video player/recorder function window acting as a shared
whiteboard with Tele-Pointer support for supporting free-hand drawing and text.

In accordance with another aspect of the invention, a method for dynamic video
annotation among a plurality of users utilizes a programmable computer and comprises
the steps of: selecting a network service coupled to the computer; performing one of (a)
starting and (b) joining a collaborative session among the users; loading one of (a) a
video and (b) a recorded annotation file; performing annotation of at least one of
graphical, text, and audio annotation; and storing the collaborative session.

BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be better understood from the following detailed description
in conjunction with the drawings, in which:

Fig. 1 is a diagram illustrating a collaborative dynamic annotation session over a
Public Switched Telecommunications Network (PSTN) in accordance with the
invention;

Fig. 2 is a diagram illustrating a collaborative dynamic annotation session over
Internet Phone (IP) in accordance with the invention;

Fig. 3 is a diagram illustrating a front-end user interface in accordance with the 
invention; 

Fig. 4 is a diagram illustrating a main client user interface in accordance with 
the invention; 

Fig. 5 is a diagram illustrating a user interface in accordance with the invention
for deleting dynamic markers by name;

Fig. 6 is a diagram illustrating a video browser interface in accordance with the 
invention; 

Fig. 7 is a diagram illustrating a dynamic object marker creation tool interface
in accordance with the invention;

Fig. 8 is a diagram illustrating the system architecture in accordance with the 
invention; 

Fig. 9 is a diagram illustrating the system message flow in accordance with the 
invention; and 

Fig. 10 is a diagram illustrating an example that shows how event messages are
communicated in accordance with the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The invention features a shared "video player" with multi-point VCR control,
dynamic speed adjustment, and the ability to show dynamic markers. It is emphasized,
however, that the "video player" is contemplated in the present invention to be either (a)
a simulated video player function provided by a computer with stored video in its
memory and simulated video player controls, or (b) an actual VCR operated in
conjunction with the rest of the system. Accordingly, for simplicity and convenience,
the terms video player, video player/recorder, video recorder/player, VCR, VCR
function, and video recorder/player function are used in the present patent application
to mean either a simulated or an actual VCR or video player/recorder, depending on how
a particular embodiment is constituted.

It is also understood that the apparatus may utilize a television receiver for the
video display function, in association with a computer and simulated or actual
hardware, to provide the functions herein disclosed.

Fig. 1 shows an embodiment illustrating how the invention is utilized in
conjunction with a public switched telephone network. A telephone switch 100 is
coupled to a computer 130, which in turn is coupled by way of a local area network
(LAN) to each of client set-ups 102 and 104, each of which is equipped with a
microphone 125. Telephone switch 100 is coupled to a collaborative dynamic video
annotation server 120 by way of a computer modem 115, and is also coupled to
telephones 110 and a monitor. Further computers, monitors, speakers, and telephones
similar to those shown in Fig. 1 may be coupled to the system, although they are not
illustrated in Fig. 1.

Telephone switch 100 is responsible for handling a telephone conference. In
this setup, participants in the conference can also use an IP-PSTN (Internet Phone to
Public Switched Telephone Network) bridge 130, which allows Intranet Phone users to
use the public switched telephone network. The collaborative dynamic video annotation
server 120, in accordance with the invention, handles all system messages, such as those
marked by reference numeral 1030 in Fig. 10, sent by clients over the Internet. Server
120 also connects to the telephone conference switch 100 by way of a computer modem
115. There are two types of clients or participants: one utilizes a regular telephone
connection 110, and the other utilizes LAN-coupled Intranet Phone connections. Both
types of clients are equipped with a mouse 140, a keyboard 155, a pair of speakers 126,
a monitor 160, and a main computer 108, coupled as shown in Fig. 1. For a computer
equipped with an Intranet Phone connection, a microphone 125 is required.

An embodiment in Fig. 2 shows the invention as utilized in an Internet Phone
environment. In this setup, the server 200, which provides the functionalities herein
described and is coupled to the Internet, handles all event messages sent by clients over
the Internet. This server also acts as a digital phone switch, mixing all or part of the
voice comments spoken by conference participants and broadcasting the mixed signals
back to all clients. Each client is equipped with a mouse 220, a keyboard 215, a pair of
speakers 230, a microphone 225, a monitor 205, and a main computer 210 coupled to
the Internet.

Fig. 3 illustrates an embodiment of a front-end interface in accordance with the
invention, optionally including a telephone dialer button. In accordance with an
embodiment of the invention, network service differences are hidden under a software
layer through the use of Microsoft's DirectPlay or any software that implements the
T.123 standard. Interface 300 asks the user to select which network service to use. The
front-end server-side user interface 310 prompts the user/operator to select a service
provider. The client-side interface 320 selects a service provider and, eventually, the
collaborative dynamic video annotation server.
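The idea of hiding network service differences under a software layer can be illustrated with a small abstract interface. This sketch does not use or imitate the DirectPlay or T.123 APIs; `Transport`, `LoopbackTransport`, and `make_transport` are hypothetical names, and an in-memory loopback stands in for a real TCP/IP, IPX, modem, or serial implementation so the example runs anywhere.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Callable, Deque, Dict, Optional


class Transport(ABC):
    """Hypothetical interface that the rest of the client codes against."""

    @abstractmethod
    def send(self, data: bytes) -> None: ...

    @abstractmethod
    def receive(self) -> Optional[bytes]: ...


class LoopbackTransport(Transport):
    """In-memory stand-in for a real network service."""

    def __init__(self) -> None:
        self._queue: Deque[bytes] = deque()

    def send(self, data: bytes) -> None:
        self._queue.append(data)

    def receive(self) -> Optional[bytes]:
        # Non-blocking: returns None when nothing is waiting.
        return self._queue.popleft() if self._queue else None


# A real system would register one factory per service chosen in interface 300.
FACTORIES: Dict[str, Callable[[], Transport]] = {"loopback": LoopbackTransport}


def make_transport(service: str) -> Transport:
    try:
        return FACTORIES[service]()
    except KeyError:
        raise ValueError(f"unsupported network service: {service}") from None
```

The session logic above this layer only ever calls `send` and `receive`, so switching from one network service to another changes only which factory the front-end selects.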

Fig. 4 shows an embodiment of a main client user interface in accordance
with the invention. A user first selects a video file or a recorded annotation file to load
using interface 400, while interface 410 gives brief summary instructions on its usage.
Video frames are displayed in a window 498. Buttons 408 (stop), 412 (play),
416 (pause), 420 (fast forward), 424 (fast rewind), 428 (step forward), and 432 (step
backward) provide the basic VCR control functions for playing the video. A slider 436
shows the current video position and also provides a means for a user to jump directly
to any frame in the video. Text display 440 shows the current frame number and the
total number of frames in the video. A user can dynamically adjust the play speed by
moving a slider 444, or by pressing the + key on a standard keyboard to increase the
current frame rate by 0.1 frame/sec and the - key to decrease it by 0.1 frame/sec. The
current frame rate is shown in display 448.
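The + and - key behaviour can be sketched as below. The 0.1 frame/sec step comes from the text; the lower and upper bounds, and the idea of broadcasting a speed change as a JUMP-PLAY at the current frame, are assumptions for illustration, and the function names are hypothetical. `Decimal` is used so that repeated 0.1 steps do not accumulate binary floating-point error.

```python
from decimal import Decimal

STEP = Decimal("0.1")       # change per key press, as described for + and -
MIN_RATE = Decimal("0.1")   # hypothetical lower bound
MAX_RATE = Decimal("60.0")  # hypothetical upper bound


def adjust_rate(rate: Decimal, key: str) -> Decimal:
    """Apply one + or - key press and clamp the result to the allowed range."""
    if key == "+":
        rate += STEP
    elif key == "-":
        rate -= STEP
    else:
        raise ValueError(f"unexpected key: {key!r}")
    return max(MIN_RATE, min(MAX_RATE, rate))


def rate_change_command(frame: int, rate: Decimal) -> str:
    # Assumption: a speed change is sent through the server as a JUMP-PLAY at the
    # current frame, so all players switch speed from the same position.
    return f"JUMP-PLAY {frame} {rate}"
```

For example, three + presses starting from `Decimal("1.0")` give exactly `Decimal("1.3")`, and `rate_change_command(300, Decimal("1.1"))` produces `"JUMP-PLAY 300 1.1"`.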

In order to allow a user to control the playback of a recorded annotation while
still permitting them to change the course of video playback, a second set of VCR
controls, 452 (record), 456 (stop), 460 (play), and 464 (pause), is provided on the main
client user interface. The two sets of VCR controls are active simultaneously. However,
when any button of the first set (408, 412, 416, 420, 424, 428, 432) is pressed, playback
of the recorded annotation is automatically stopped. Similarly, if a user starts playing a
recorded annotation while the first set of VCR controls is active, the system stops the
current video playback activity. An exception is the recording activity (button 452),
which is not stopped even if the user presses any button of the first set.
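The interlock between the two control sets can be expressed as a small state machine. The rules that any first-set button stops annotation playback, that annotation play stops video playback, and that recording survives first-set presses come from the text. Which first-set buttons count as "playing", and the behaviour of stop (456), are assumptions; `ControlState` and its methods are hypothetical names.

```python
from dataclasses import dataclass

# First set of VCR controls (buttons 408-432 in Fig. 4).
VIDEO_BUTTONS = {"stop", "play", "pause", "fast_forward", "fast_rewind",
                 "step_forward", "step_backward"}
# Assumption: these first-set buttons leave the video in a "playing" state.
PLAYING_BUTTONS = {"play", "fast_forward", "fast_rewind"}


@dataclass
class ControlState:
    video_playing: bool = False       # driven by the first set of controls
    annotation_playing: bool = False  # playback of a recorded annotation (button 460)
    recording: bool = False           # recording (button 452)

    def press_video(self, button: str) -> None:
        if button not in VIDEO_BUTTONS:
            raise ValueError(f"unknown video button: {button}")
        self.annotation_playing = False        # first set stops annotation playback
        self.video_playing = button in PLAYING_BUTTONS
        # self.recording is deliberately left unchanged.

    def press_annotation(self, button: str) -> None:
        if button == "record":
            self.recording = True
        elif button == "play":
            self.video_playing = False         # stop the current video playback activity
            self.annotation_playing = True
        elif button == "pause":
            self.annotation_playing = False
        elif button == "stop":
            # Assumption: stop (456) ends both annotation playback and recording.
            self.annotation_playing = False
            self.recording = False
        else:
            raise ValueError(f"unknown annotation button: {button}")
```

Keeping `recording` independent of the first set is what lets a participant annotate over, and take control away from, a recorded session while the new session is still being captured.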

In accordance with the invention, a user can make a free-hand drawing by
holding down the left mouse button while moving the mouse cursor over the video frame
window 498. The lines are shown on all clients' video frames in a per-user color
selected either by the system or by the user. A user can also type a text string on the
video frame window 498 by first right-clicking. The system then pops up a text edit
window on that particular client's screen only. The user can type single or multiple line
text strings and click the OK button when ready; only at that time do the typed
string(s) appear on all clients' video frames, at the place where the right mouse click
occurred. Note that the conference activities can continue during the typing process. A
user can also turn on their own Tele-Pointer, by pressing button 480, so that other
participants know where they are pointing the mouse. In order not to confuse users with
too many graphical objects on screen, all graphical annotations, text annotations, and
Tele-Pointers relating to the same user are drawn in the same color.
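The per-user color rule can be sketched as a small registry consulted whenever a line, text string, or Tele-Pointer is drawn. The palette contents and the cyclic reuse when colors run out are assumptions; `ColorRegistry` and `color_for` are hypothetical names.

```python
from typing import Dict, Optional, Sequence

# Hypothetical default palette; the document does not name specific colors.
DEFAULT_PALETTE = ("red", "blue", "green", "orange", "purple", "cyan")


class ColorRegistry:
    """Assigns one drawing color per user for lines, text, and Tele-Pointers."""

    def __init__(self, palette: Sequence[str] = DEFAULT_PALETTE) -> None:
        self._palette = list(palette)
        self._colors: Dict[str, str] = {}

    def color_for(self, user: str, preferred: Optional[str] = None) -> str:
        if user in self._colors:               # a user keeps the same color all session
            return self._colors[user]
        taken = set(self._colors.values())
        if preferred is not None and preferred not in taken:
            color = preferred                  # user-selected color, if still free
        else:
            free = [c for c in self._palette if c not in taken]
            # System-selected color; reuse the palette cyclically if it is exhausted.
            color = free[0] if free else self._palette[len(self._colors) % len(self._palette)]
        self._colors[user] = color
        return color
```

Since the color is keyed by user rather than by annotation type, a participant's drawings, text, and pointer are always visually grouped on every client's screen.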

The system also allows a user to erase all graphical annotations by pressing
button 472, all text annotations by pressing button 468 in the described embodiment,
and the records of selected dynamic markers by pressing button 476.

Three attached tools are shown in the main client user interface in Fig. 4: a
video browser 486, a dynamic marker creation tool for dynamic objects 490, and a
dynamic marker creation tool for steady (or static) objects 494. To start either of the
first two tools, the user simply presses the corresponding button. To start the third tool,
a user must first pause the video, then click on button 494, and finally draw a free-hand
curve on the video frame window 498.

Fig. 5 illustrates an embodiment of a user interface in accordance with the
invention for deleting dynamic markers by name. As noted above, a dynamic marker
indicates the location of a dynamic object or a dynamic parameter pertaining to a static
object. Each name is entered by the user who creates the marker. A list of marker names
500 is shown. A user can move a scroll bar 540 to view the whole list, select a name
from the list, and click on a button 510 to delete it. After deleting all unwanted markers,
the user can click on a button 520 to close this pop-up window. If the user opens this
pop-up window by mistake, they can click on a button 530 to close it.

Fig. 6 illustrates the video browser tool. This tool displays the first frame of
each shot as a thumbnail picture 600. A user can use a scroll bar 630 to glance quickly
through all thumbnail pictures, and click on button 610 to quit the video browser. While
the video is playing, the shot containing the current video frame is synchronously
highlighted by a red frame 620.
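Highlighting the current shot requires mapping the frame number now on screen to the shot that contains it. Assuming the browser holds a sorted list of shot start frames (the first frame of each thumbnail), a binary search from Python's standard `bisect` module does this in logarithmic time; the function name `shot_index` is hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def shot_index(shot_starts: Sequence[int], frame: int) -> int:
    """Index of the shot containing `frame`; shot_starts must be sorted and begin at 0."""
    if not shot_starts or shot_starts[0] != 0:
        raise ValueError("shot_starts must be non-empty and start at frame 0")
    if frame < 0:
        raise ValueError("frame must be non-negative")
    # bisect_right counts the shot starts <= frame; the last of those is our shot.
    return bisect_right(shot_starts, frame) - 1
```

With shot starts `[0, 120, 300, 455]`, frame 119 maps to shot 0 and frame 120 to shot 1; the browser would move the red highlight only when this index changes.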

Fig. 7 is a diagram that illustrates the tool for creating dynamic object markers.
This tool provides a separate video player 748, a slider 728, a current frame indicator
732, a video browser 768, and a cross-section viewer 752 to assist a user in finding a
clip of interest. After the user selects an IN point in the video by pressing button 736
and an OUT point by pressing button 740, the video frame corresponding to the IN
point is displayed in window 700 and the OUT point frame is displayed in window 704.
The cross-section image 756 is generated directly from the video by (1) sampling the
middle row and the middle column of every image, (2) collecting all samples over time,
(3) combining them into one image, and (4) finally segmenting the image into at least
two bands according to the list of detected shots. This representation provides a level of
abstraction that reveals the continuity of video frames. A scroll bar 764 allows for a
quick glance, while the current frame indicator 760 is synchronized with the position of
the video.
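Steps (1) through (4) of the cross-section construction can be sketched on grayscale frames stored as lists of rows. The exact way the row and column samples are combined into one image is not specified in the text; this sketch assumes each frame contributes one output column made of its middle column stacked above its middle row, and that the bands of step (4) are time intervals, one per detected shot. `cross_section` and `bands` are hypothetical names.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]  # grayscale frame as rows of pixel values


def cross_section(frames: Sequence[Frame]) -> List[List[int]]:
    """Steps (1)-(3): one output column per frame, time running left to right."""
    columns = []
    for f in frames:
        h, w = len(f), len(f[0])
        middle_column = [f[y][w // 2] for y in range(h)]   # step (1): column sample
        middle_row = list(f[h // 2])                       # step (1): row sample
        columns.append(middle_column + middle_row)         # step (2): collect over time
    # Step (3): transpose the per-frame columns into a single (H + W) x T image.
    return [list(r) for r in zip(*columns)]


def bands(shot_starts: Sequence[int], n_frames: int) -> List[Tuple[int, int]]:
    """Step (4): split the time axis into [start, end) bands, one per detected shot."""
    bounds = list(shot_starts) + [n_frames]
    return [(bounds[i], bounds[i + 1]) for i in range(len(shot_starts))]
```

Within a shot, the sampled pixels change smoothly from column to column, so continuous motion appears as unbroken streaks, while a cut appears as an abrupt vertical discontinuity at a band boundary.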

Once the IN and the OUT point video frames are displayed in windows 700 and
704, a user can draw poly-lines, that is, sequences of connected line segments such as the one shown by reference
numeral 724 in Fig. 7, to outline the boundary of the object in each window. If there is
a mistake, the user can erase these lines by pressing button 712. After the poly-lines are
drawn, a user can type a name for the marker in box 744 and click on button 708 to ask
the server to extract the boundary of the same object in all frames between the IN and
the OUT points. These steps can be repeated until button 716 is pressed. The user
can also cancel this tool by pressing button 720.
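The source leaves the server's boundary-extraction algorithm unspecified. As a placeholder only, the sketch below uses plain linear interpolation of corresponding poly-line vertices between the IN and OUT frames. This is a swapped-in stand-in, not the extraction method of the invention. It assumes both outlines have the same number of vertices, listed in corresponding order. A real extractor would presumably refine each in-between outline against the image content.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def interpolate_boundaries(in_frame: int, out_frame: int,
                           in_poly: List[Point], out_poly: List[Point]
                           ) -> Dict[int, List[Point]]:
    """Naive stand-in: linearly blend each IN vertex toward its OUT vertex
    for every frame from in_frame to out_frame inclusive."""
    if out_frame <= in_frame:
        raise ValueError("OUT point must come after IN point")
    if len(in_poly) != len(out_poly):
        raise ValueError("poly-lines must have the same number of vertices")
    span = out_frame - in_frame
    outlines: Dict[int, List[Point]] = {}
    for frame in range(in_frame, out_frame + 1):
        a = (frame - in_frame) / span  # 0.0 at the IN point, 1.0 at the OUT point
        outlines[frame] = [((1 - a) * x0 + a * x1, (1 - a) * y0 + a * y1)
                           for (x0, y0), (x1, y1) in zip(in_poly, out_poly)]
    return outlines
```

The returned frame-to-outline mapping has the shape of the per-frame locations that the server sends back to all participants.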

Fig. 8 illustrates the system architecture of an embodiment in accordance with
the principles of the invention. The Collaborative Video Annotation Server 800
receives messages from each session manager 812, serializes them, and re-sends them
to every session manager. It also manages the conference, keeping a record of all
participants as well as the state of the conference. This state information includes the video
player state (play/pause and current frame number), the Tele-Pointer state
(show/hide and current position), the annotations (graphical and text)
currently on screen, the video filename being loaded, and the dynamic object markers (marker
coordinates and associated frames). The server is also responsible for bringing a new participant
up to the current state of the conference and for recording a collaborative discussion
into a single script file for later playback. The final overlaid videograms are
represented by reference numeral 852 and the recorded annotation by 804.
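A minimal, single-process sketch of the server's three duties follows: serializing and rebroadcasting messages, tracking conference state for latecomers, and recording a time-stamped script. Networking is omitted, and each session manager is modeled as a callback. The message names and dictionary layout are hypothetical. Because messages are handled one at a time, the order of arrival at the server becomes the single order that every client sees.

```python
import copy
import time
from typing import Any, Callable, Dict, List, Tuple

Message = Dict[str, Any]  # hypothetical wire format, e.g. {"type": "JUMP-PAUSE", "frame": 120}


class AnnotationServer:
    """Serializes messages, re-sends them to every session manager,
    tracks conference state, and records a time-stamped script."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.t0 = clock()
        self.clients: Dict[str, Callable[[Message], None]] = {}
        self.state: Dict[str, Any] = {
            "video": None, "playing": False, "frame": 0, "rate": 1.0,
            "annotations": [],
        }
        self.script: List[Tuple[float, Message]] = []

    def join(self, name: str, deliver: Callable[[Message], None]) -> None:
        self.clients[name] = deliver
        # Bring the newcomer up to the current state of the conference.
        deliver({"type": "STATE", "state": copy.deepcopy(self.state)})

    def submit(self, msg: Message) -> None:
        # Messages are handled one at a time, so arrival order is the
        # single order in which every participant receives them.
        self._apply(msg)
        self.script.append((self.clock() - self.t0, msg))
        for deliver in list(self.clients.values()):
            deliver(msg)

    def _apply(self, msg: Message) -> None:
        kind = msg["type"]
        if kind == "LOAD":
            self.state.update(video=msg["file"], playing=False, frame=0,
                              annotations=[])
        elif kind == "JUMP-PLAY":
            # Stores the frame at the moment of the command; a fuller
            # version would extrapolate the position while playing.
            self.state.update(playing=True, frame=msg["frame"], rate=msg["rate"])
        elif kind == "JUMP-PAUSE":
            self.state.update(playing=False, frame=msg["frame"])
        elif kind == "ANNOTATE":
            self.state["annotations"].append(msg["record"])
        elif kind == "ERASE":
            self.state["annotations"] = []
```

The `script` list of `(seconds, message)` pairs is the kind of record that could be written to the single script file for later playback.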

The session manager 812 serves as the mediator between the Server 800, to
which it is coupled by way of the Internet, and the remaining client modules 816, 820, 824,
828, and 832. It acts like an intelligent message router: it transforms mouse/keyboard
events from user interface 820 into system messages and sends them to the
collaborative video annotation server 800, forwards system messages from attached
tools 832 to the collaborative video annotation server 800, and suppresses or distributes
server messages to the rest of the client modules. The system messages include all VCR-related
commands, cursor positions, Tele-Pointer commands, annotation records (both
graphical and text), and annotation commands. When the server echoes them back, the session manager 812 suppresses all local
annotation record messages and all local Tele-Pointer commands, since these are already shown locally.
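The routing and suppression rule of the session manager can be sketched as follows, assuming each message carries an `origin` field that identifies the sending client. The field, the class, and the message type names are hypothetical. Local events are only forwarded to the server. When messages return from the server, the client's own annotation and Tele-Pointer messages are dropped, and everything else, including the client's own VCR commands, is handed to a client module.

```python
from typing import Any, Callable, Dict

Message = Dict[str, Any]  # hypothetical wire format

# Local echoes of these types are dropped: the local client already shows them.
SUPPRESS_IF_LOCAL = {"ANNOTATE", "TELEPOINTER"}


class SessionManager:
    def __init__(self, client_id: str,
                 send_to_server: Callable[[Message], None],
                 modules: Dict[str, Callable[[Message], None]]) -> None:
        self.client_id = client_id
        self.send_to_server = send_to_server
        self.modules = modules  # message type -> client module handler

    def on_local_event(self, msg: Message) -> None:
        # Tag the sender and forward to the server; VCR commands are NOT
        # executed here, only when the server echoes them back.
        self.send_to_server({**msg, "origin": self.client_id})

    def on_server_message(self, msg: Message) -> bool:
        """Filter a message from the server; return True if a client module received it."""
        if msg.get("origin") == self.client_id and msg["type"] in SUPPRESS_IF_LOCAL:
            return False
        handler = self.modules.get(msg["type"])
        if handler is None:
            return False
        handler(msg)
        return True
```

Executing VCR commands only on their echo is what keeps every player in step, because all clients then act on the same server-ordered stream.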

The Tele-Pointer 816 receives colleagues' active pointer positions and
commands and draws them on the Tele-Pointer overlay plane 840. The Tele-Pointer
commands include Show and Hide. The system maintains a color scheme so that the
pointer and the annotations of a participant are drawn in the same color, one that is
distinct from the colors of the other participants.

The video player 828 decodes the video into uncompressed audio and video frames
and responds to a variety of VCR commands sent by the session manager 812. Because
playback must be synchronized among all participants, it is preferred to map all
traditional VCR-related commands onto the following new commands:

PLAY => JUMP-PLAY CURRENT-FRAME 1.0
PAUSE => JUMP-PAUSE CURRENT-FRAME
STEP FORWARD => JUMP-PAUSE CURRENT-FRAME+1
STEP BACKWARD => JUMP-PAUSE CURRENT-FRAME-1
FAST FORWARD => JUMP-PLAY CURRENT-FRAME 2.0
FAST REWIND => JUMP-PLAY CURRENT-FRAME -2.0

As will be noted, the system uses only two distinct types of VCR
commands, namely JUMP-PLAY frame-number frame-rate and JUMP-PAUSE frame-number.
To support dynamic adjustment of the playback speed, the system
adds two new VCR-related functions, + and -, and maps them to JUMP-PLAY as
follows:

+ => JUMP-PLAY CURRENT-FRAME CURRENT-RATE+0.1
- => JUMP-PLAY CURRENT-FRAME CURRENT-RATE-0.1
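One way to read this mapping: a relative command such as "step forward one frame", applied by players sitting at slightly different positions, would leave them at different frames. A command that states the target frame explicitly makes every player land on the same frame. The sketch below encodes the table above. The class and function names are hypothetical, and clamping STEP BACKWARD at frame 0 is an assumption the source does not state.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class JumpPlay:
    frame: int
    rate: float


@dataclass(frozen=True)
class JumpPause:
    frame: int


Command = Union[JumpPlay, JumpPause]


def translate(button: str, current_frame: int, current_rate: float) -> Command:
    """Map a local VCR button press onto an absolute JUMP-PLAY/JUMP-PAUSE command."""
    if button == "PLAY":
        return JumpPlay(current_frame, 1.0)
    if button == "PAUSE":
        return JumpPause(current_frame)
    if button == "STEP FORWARD":
        return JumpPause(current_frame + 1)
    if button == "STEP BACKWARD":
        # Assumption: clamp at the first frame (the source does not say).
        return JumpPause(max(0, current_frame - 1))
    if button == "FAST FORWARD":
        return JumpPlay(current_frame, 2.0)
    if button == "FAST REWIND":
        return JumpPlay(current_frame, -2.0)
    if button in ("+", "-"):
        step = 0.1 if button == "+" else -0.1
        # Round so repeated presses do not accumulate floating-point drift.
        return JumpPlay(current_frame, round(current_rate + step, 2))
    raise ValueError(f"unknown VCR button: {button!r}")
```

For example, pressing + while playing at rate 1.0 on frame 10 yields `JumpPlay(10, 1.1)`.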

User interface 820 provides the elements of the player interface, such as VCR
controls, Tele-Pointer controls, and the ability to launch attached tools 832, and it monitors
all mouse/keyboard events. Mouse/keyboard events related to annotation are sent to
the video annotator 824, whose function is described below, while all
mouse/keyboard events are sent to the session manager 812 for further interpretation.
As was mentioned above, to draw a graphical annotation, the participant holds
down the left mouse button while moving the mouse. To draw a text annotation, the
participant clicks the right button and enters a text string in the pop-up box.

The Audio Mixer 856 mixes the uncompressed audio signal generated by the 
video player with the audio output from the unified computer/Internet telephony 
interface, and sends it to the speaker. 
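A sketch of the mixing step follows. It assumes both sources have already been decoded to signed 16-bit PCM with the same sample rate and channel layout; the source does not state the format. Samples are summed and clamped to the 16-bit range so that loud passages saturate instead of wrapping around. The function name is hypothetical.

```python
from itertools import zip_longest
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_pcm16(video_audio: Sequence[int], phone_audio: Sequence[int]) -> List[int]:
    """Sum two signed 16-bit sample streams, clamping the result.
    The shorter stream is padded with silence (0)."""
    return [max(INT16_MIN, min(INT16_MAX, a + b))
            for a, b in zip_longest(video_audio, phone_audio, fillvalue=0)]
```

Summing keeps each source at its original loudness when the other is silent. Averaging would avoid clipping but would halve the volume of a lone speaker.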
The graphics overlay/image mixer 848 overlays the graphical annotation, text annotation,
and Tele-Pointer icons on top of the decoded video frame. This overlay
has to be done efficiently and effectively because these elements must be updated at video
rate. The system should therefore set up several off-screen planes (image buffers): two for the video
frame, one for graphical annotation 836, one for text annotation 836, one for
Tele-Pointers 840, one for dynamic markers of dynamic objects 844, and one for dynamic
markers of static objects 844. One video frame buffer is a duplicate of the other,
kept for restoration purposes. This arrangement allows the system to erase graphical
annotation separately from text annotation and dynamic markers, and to
provide timely annotation/Tele-Pointer updates without requiring a fast video frame
rate. To provide timely annotation/Tele-Pointer updates, the system has to
update annotations and Tele-Pointers on a regular basis, independently of whether the
next video frame is ready. During this update, the mixer is also
responsible for restoring the part of the background video frame that is uncovered by the
motion of Tele-Pointers and by the erasure of graphical/text annotations. This is the reason
for having two video frame buffers in accordance with the principles of the invention.
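The two-buffer restoration scheme can be sketched as follows. Images are modeled as lists of integer pixel rows, and `None` marks a transparent overlay pixel. The class, method, and plane names are hypothetical. The `pristine` buffer is the untouched duplicate of the decoded frame. When a pointer moves or an annotation is erased, only the affected region is recomposed from that copy plus the remaining planes, without waiting for a new video frame.

```python
import copy
from typing import Dict, List, Optional

Pixel = Optional[int]  # an overlay pixel; None means transparent


class OverlayCompositor:
    def __init__(self, frame: List[List[int]], plane_names: List[str]) -> None:
        self.height, self.width = len(frame), len(frame[0])
        self.pristine = copy.deepcopy(frame)  # untouched copy of the decoded frame
        self.screen = copy.deepcopy(frame)    # frame plus overlays, as displayed
        self.order = list(plane_names)        # later planes are drawn on top
        self.planes: Dict[str, List[List[Pixel]]] = {
            name: [[None] * self.width for _ in range(self.height)]
            for name in self.order
        }

    def refresh_region(self, x0: int, y0: int, x1: int, y1: int) -> None:
        """Restore the background in [x0, x1) x [y0, y1) from the pristine
        copy, then repaint every overlay plane over it."""
        for y in range(y0, y1):
            for x in range(x0, x1):
                value = self.pristine[y][x]
                for name in self.order:
                    pixel = self.planes[name][y][x]
                    if pixel is not None:
                        value = pixel
                self.screen[y][x] = value

    def refresh_all(self) -> None:
        self.refresh_region(0, 0, self.width, self.height)

    def new_video_frame(self, frame: List[List[int]]) -> None:
        self.pristine = copy.deepcopy(frame)
        self.refresh_all()

    def set_pixel(self, plane: str, x: int, y: int, value: Pixel) -> None:
        # Pointer and annotation updates touch only the changed pixel.
        self.planes[plane][y][x] = value
        self.refresh_region(x, y, x + 1, y + 1)

    def clear_plane(self, plane: str) -> None:
        # Erasing one plane leaves the other planes intact.
        self.planes[plane] = [[None] * self.width for _ in range(self.height)]
        self.refresh_all()
```

Moving a Tele-Pointer amounts to `set_pixel("pointer", old_x, old_y, None)` followed by `set_pixel("pointer", new_x, new_y, color)`. The uncovered pixel is recomposed from the pristine frame and any annotation beneath it.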

The video annotator 824 receives the local client's and colleagues' active
annotation records and commands and draws them on the annotation overlay 836. Unlike local
VCR control commands, whose action is delayed until the server echoes them back, local annotation drawings are handled
immediately by the system in accordance with the invention, because for drawing,
prompt feedback is more important. As was described earlier, in accordance with the
principles of the invention, the drawings, text, and Tele-Pointers of the same
participant are drawn in the same color, distinct from those of other participants.
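The source does not say how colors are chosen. A minimal sketch assigns colors from a fixed palette in join order. The palette values and class name are hypothetical. Each participant then keeps one color for pointer, drawings, and text, and different participants get different colors as long as there are no more participants than palette entries. Because the server serializes join messages, every client sees the same join order and would derive the same assignment.

```python
from typing import Dict, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette; any set of mutually distinguishable colors would do.
PALETTE: Tuple[RGB, ...] = (
    (230, 25, 75), (60, 180, 75), (0, 130, 200), (245, 130, 48),
    (145, 30, 180), (70, 240, 240),
)


class ColorScheme:
    def __init__(self) -> None:
        self.assigned: Dict[str, RGB] = {}

    def color_for(self, participant: str) -> RGB:
        """Same participant -> same color for pointer, drawings and text."""
        if participant not in self.assigned:
            if len(self.assigned) >= len(PALETTE):
                raise RuntimeError("more participants than distinct colors")
            self.assigned[participant] = PALETTE[len(self.assigned)]
        return self.assigned[participant]
```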

The attached tools 832 are components in accordance with the principles of the
invention that help participants browse video content and create dynamic
markers for static as well as dynamic objects. For example, a video browser displaying
a list of thumbnail pictures representing the video is one such tool. If a participant
makes a selection in the video browser, the video browser sends a VCR
command to the session manager so that the player plays from that particular point.

Another example is a tool for creating dynamic markers for dynamic objects.
A participant first selects two frames in the video and draws the boundary of the
object in these two video frames. A request to find the location of this object in
the frames between them is then sent to the server. After the server finishes the request, it
sends a message containing the locations of this object in all in-between frames to all
participants. Each participant records this information locally and displays the markers
whenever the video plays through those frames. Finally, creating dynamic markers for
static objects requires a participant to first pause the video, click on the tool button, and
draw a curve. An animation of an arrow following the curve is then shown until a
participant plays the video again.
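The local recording of marker locations can be sketched as a per-frame index. The player asks it what to draw each time it displays a frame. This assumes the server's message provides a mapping from frame number to outline. The message layout and class name are hypothetical. Deletion by name mirrors the marker-deletion pop-up described for Fig. 5.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

Point = Tuple[float, float]


class MarkerStore:
    """Client-side record of dynamic markers, indexed by frame number."""

    def __init__(self) -> None:
        self._by_frame: DefaultDict[int, Dict[str, List[Point]]] = defaultdict(dict)

    def add(self, name: str, outlines: Dict[int, List[Point]]) -> None:
        """Record a marker received from the server: frame -> outline."""
        for frame, outline in outlines.items():
            self._by_frame[frame][name] = outline

    def remove(self, name: str) -> None:
        for markers in self._by_frame.values():
            markers.pop(name, None)

    def markers_at(self, frame: int) -> Dict[str, List[Point]]:
        """Markers the player should draw on this frame (empty if none)."""
        return dict(self._by_frame.get(frame, {}))
```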

In accordance with the principles of the invention, the Unified
Computer/Internet Telephony Interface 808 provides a unified interface to computer
or Internet telephony, which allows users to work in heterogeneous environments.

Fig. 9 illustrates the system message flow in accordance with the principles of
the invention, for a collaborative session with three clients. Each client
maintains its own message queue (900, 908, 916), in which new messages may arrive at
slightly different times but in an order that is preserved across all clients. These messages
include the loading of a video or recorded annotation file, modified VCR commands,
annotation records/commands, Tele-Pointer records/commands, and dynamic marker
records/commands. Each session manager (924, 936, 948) is responsible for forwarding
the messages sent by its user interface and attached tools (904, 912, 920) to the
collaborative dynamic video annotation server (960) and for retrieving messages from
its message queue (900, 908, 916). Each retrieved message is then filtered through a
message filter (928, 940, 952) within the session manager (924, 936, 948) before being
distributed to the client modules for execution (932, 944, 956). During the
playback of a recorded annotation, the collaborative dynamic video annotation server
(960) primarily retrieves messages from the conference recording (964), but it can still
receive messages sent by each session manager (924, 936, 948). Each message
recorded in the conference recording (964) is preceded by a time stamp, and the collaborative
dynamic video annotation server (960) retrieves these messages according to their time
stamps. The messages coming from both sources are mixed and sent to each client's message
queue (900, 908, 916) in the order in which they are received. This ability to mix
recorded messages and live messages allows a user to annotate during the playback of a
recorded annotation.
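The mixing rule can be modeled offline as a merge of two time-ordered streams. In this sketch, recorded messages are released at their time stamps and live messages are stamped with their arrival time. The function name and the `(seconds, message)` tuple layout are assumptions. A live server would wait for each recorded time stamp to come due rather than merge precomputed lists. The merged stream is itself time-stamped, so it could be written out as a new recording.

```python
import heapq
from typing import Any, Dict, Iterable, Iterator, Tuple

Timed = Tuple[float, Dict[str, Any]]  # (seconds since playback started, message)


def mix_playback(recorded: Iterable[Timed], live: Iterable[Timed]) -> Iterator[Timed]:
    """Interleave recorded messages (released when their time stamp comes due)
    with live messages (time = arrival), in the order the server would see them.
    Both inputs must already be sorted by time; on a tie the recorded message
    goes first, because heapq.merge favors earlier iterables for equal keys."""
    return heapq.merge(recorded, live, key=lambda item: item[0])
```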

Fig. 10 illustrates an example showing how event messages are communicated and
how they affect the appearance of each client screen. There are two clients in the conference in this
example. The vertical direction 1020 indicates the time line, whereas the horizontal direction
lists the collaborative dynamic video annotation server (server) and the client-side
system components, such as the Tele-Pointer, Video Annotator, Video Player, User
Interface, and Session Manager. Two sets of screen dumps, 1000 and 1010, are displayed
on the left and right sides of the diagram respectively, in timeline order. Different
types of typical messages 1030, such as Join, Play, Show Tele-Pointer, Annotate, Erase, and
Pause, are illustrated in the middle section of the figure, again in timeline order. For
simplicity, only abbreviated message names are used in this illustration.

The present invention provides a new apparatus that permits people in
different places to share their views about video content, to record such a
collaborative session, to play back such a recorded annotation synchronously, and to
annotate/record during playback of a recorded annotation. The apparatus in accordance
with the invention provides an environment for synchronous collaboration on any video
content over heterogeneous networks. It is, however, possible to use the same apparatus
to annotate a video or to play back a recorded annotation in a stand-alone scenario as
well.

It is also understood that the apparatus may utilize a television receiver 
apparatus for the video display function, in association with a computer and a simulated 
or actual hardware VCR or video player/recorder. 

As was stated above, the invention has several features. First, it provides a shared "video player" with multi-point
VCR control, dynamic speed adjustment, and the ability to show dynamic
markers. Any user can play or stop the video, jump to a different location in the video,
or dynamically change the play speed at any given time. The players of all participants are
synchronized at the same video frame whenever any VCR activity occurs. Dynamic
markers are automatically drawn when the player displays the frames to which they are
attached. Second, the video player window acts as a shared whiteboard with Tele-Pointer
support, even if there is no hardware graphic overlay support. This shared
whiteboard supports free-hand drawing and text. Any user can erase graphical or text
annotation at any time. Third, the apparatus provides an easy way to attach any tools to
the shared video player. When such tools are invoked, they are activated on the
side, while the collaborative session may continue. All computation-intensive tasks
are performed on the server side, without affecting the performance of the client. Fourth, all
conference activities, including joint voice comments, can be recorded for later
playback. These activities are time stamped to support synchronized playback. Fifth,
any recorded conference session can be loaded and played synchronously on the screens of
all users with multi-point VCR control. The recorded joint voice comments are mixed
with the audio track of the video during playback. Sixth, any user can still annotate
during the playback of a recorded annotation in a collaborative session with separate
VCR controls. Finally, the new annotation, together with the playback of the recorded
annotation, can again be recorded.

The invention is intended for implementation by computerized apparatus,
preferably a programmable digital computer, as is per se well known in the art, with
appropriate software, in conjunction with appropriate peripheral equipment as
hereinabove described, for performing steps herein disclosed for practicing the present
invention.

As will be understood, the invention has been described by way of non-limiting
exemplary embodiments. Various modifications and additions will be apparent to
those skilled in the art to which it pertains. For example, if a hardware implementation is
used to support video overlay and audio mixing, then some functions described herein by way
of exemplary embodiments, such as the Tele-Pointer, Video Annotator, audio mixer, and graphics
overlay/image mixer 848, may no longer be required in the form described.

Such changes and modifications are contemplated to be within the spirit of the 
invention and the scope of the claims which follow. 
CLAIMS

What is claimed is: 

1. A method for dynamic video annotation among a plurality of users at respective
locations, utilizing programmable computer apparatus with information storage and
retrieval capability, said method comprising the steps of:

selecting a network service coupled to said computer;

performing one of (a) starting and (b) joining a collaborative session among
said users;

loading one of (a) a video and (b) a recorded annotation file;

performing annotation of at least one of graphical, text, and audio annotation;
and

storing said collaborative session.

2. A method for dynamic video annotation as recited in claim 1, including a step
of playing back said collaborative session in a synchronous mode for ones of said users.

3. A method for dynamic video annotation as recited in claim 2, wherein said
synchronous mode is loosely synchronized.

4. A method for dynamic video annotation as recited in claim 1, including a step 
of storing or recording visible activities and any voice comments occurring during said 
collaborative session. 


5. A method for dynamic video annotation as recited in claim 1, wherein said
network service comprises at least one of Internet TCP/IP, IPX, Modem, and Serial
connection.






6. A method for dynamic video annotation as recited in claim 1, wherein said step
of performing annotation comprises utilizing at least one of Tele-Pointers; video
recorder/player controls; video browsers; graphical, text, and audio annotation; and
dynamic marker creation tools.

7. A method for dynamic video annotation as recited in claim 6, wherein said
video recorder/player exhibits play and pause modes and said step of performing
annotation can be performed independently of whether said video recorder/player is in
either of said modes.

8. A method for dynamic video annotation among a plurality of users, utilizing
programmable computer apparatus with information storage and retrieval capability,
said method comprising the steps of:

selecting a network service coupled to said computer;

joining a collaborative session among said users;

loading one of (a) a video and (b) a recorded annotation file;

performing annotation of at least one of graphical, text, and audio annotation;
and

storing said collaborative session.

9. A method for dynamic video annotation in accordance with claim 8, including
the steps of recording and playing back video from a video recorder/player coupled to
said computer apparatus and having a user-operable control interface.

10. A method for dynamic video annotation in accordance with claim 8, including a
step of individuals of said users annotating said video as to graphics; audio;
Tele-Pointer; and text.

11. A method for dynamic video annotation in accordance with claim 10, including 
a step of individuals of said users annotating graphics on an overlay. 


12. A method for dynamic video annotation in accordance with claim 10, including
the steps of

individuals of said users using a Tele-Pointer; and

automatically restoring video and graphic portions covered over by said Tele-Pointer.

13. A method for dynamic video annotation in accordance with claim 9, including
the steps of

providing a time-stamp for all messages reaching said computer apparatus from
said plurality of users; and

sending ones of said messages having been annotated and provided with said
time-stamp to said users during said playing back.

14. Apparatus for dynamic video annotation among a plurality of users, said
apparatus comprising:

programmable computer apparatus with information storage and retrieval
capability;

a user interface coupled to said computer apparatus for performing selection of
a network service;

a user interface coupled to said computer apparatus for performing one of (a)
starting and (b) joining a collaborative session among said users;

a user interface coupled to said computer apparatus for loading one of (a) a
video and (b) a recorded annotation file;

a user interface coupled to said computer apparatus for performing annotation
of at least one of graphical, text, and audio annotation; and

a user interface coupled to said computer apparatus for storing said
collaborative session.

15. Apparatus for dynamic video annotation among a plurality of users in
accordance with claim 14, including a shared video recorder/player coupled to said
computer apparatus and having a user-operable control interface, for recording visible
activities and any voice comments during said collaborative session and playing back
recorded annotations.
16. Apparatus for dynamic video annotation among a plurality of users in
accordance with claim 14, including apparatus for enabling said users to annotate
graphics, audio and text.

17. Apparatus for dynamic video annotation among a plurality of users in
accordance with claim 16, including apparatus for enabling said users to annotate
graphics on an overlay.

18. Apparatus for dynamic video annotation among a plurality of users in
accordance with claim 17, including apparatus for supporting use of a Tele-Pointer and
for restoring video and graphic portions covered over by said Tele-Pointer.

19. Apparatus for dynamic video annotation among a plurality of users in
accordance with claim 15, including apparatus for enabling said users to annotate ones
of said videos.

20. Apparatus for dynamic video annotation among a plurality of users in
accordance with claim 19, including apparatus for providing a time-stamp for all messages
reaching said computer apparatus from said plurality of users and sending said ones of
said messages having been annotated and provided with a time-stamp to said users
during said playing back so as to permit individual ones of said plurality of users to
annotate during said playing back of recorded annotations.

21. Apparatus for enabling a plurality of users at respective locations to participate
in a collaborative session regarding content of a video; to record such a collaborative
session; to annotate/record during playback of a recorded session; and to play back
synchronously such a recorded annotated session, said apparatus comprising:

a shared video player/recorder function (VCR) available to each of said users,
with multi-point VCR control exhibiting dynamic speed adjustment, and an ability to
show dynamic markers;

a function by which any of said users can play or stop said video; jump to a
different location in the video; dynamically change video play speed;

said shared video player/recorder function available to each of said users being
synchronized at the same video frame whenever any VCR activity occurs; and

apparatus for displaying a dynamic marker when a frame to which such a
marker is attached is displayed.

22. Apparatus for enabling a plurality of users at respective locations to participate
in a collaborative session regarding content of a video; to record such a collaborative
session; to annotate/record during playback of a recorded session; and to play back
synchronously such a recorded annotated session, said apparatus comprising:

a shared video player/recorder function (VCR) exhibiting a window available to
each of said users, with multi-point VCR control exhibiting dynamic speed adjustment,
and an ability to show dynamic markers;

a function by which any of said users can play or stop said video; jump to a
different location in the video; dynamically change video play speed;

said shared video player/recorder function available to each of said users being
synchronized at the same video frame whenever any VCR activity occurs;

apparatus for displaying a dynamic marker when a frame to which such a
marker is attached is displayed; and

said shared video player/recorder function window acting as a shared
whiteboard with Tele-Pointer support for supporting free-hand drawing and text.

23. Apparatus for enabling a plurality of users at respective locations to participate
in a collaborative session regarding content of a video in accordance with claim 22,
wherein said shared video player/recorder function window, acting as a shared
whiteboard with Tele-Pointer support for supporting free-hand drawing and text, so
functions independently of the presence of any hardware graphic overlay support.

24. Apparatus for enabling a plurality of users at respective locations to participate
in a collaborative session regarding content of a video in accordance with claim 22
wherein any user can erase graphical or text annotation at any time.
25. Apparatus for enabling a plurality of users at respective locations to participate
in a collaborative session regarding content of a video in accordance with claim 22
including means for attaching any of a plurality of tools to said shared video player
function.

26. Apparatus for enabling a plurality of users at respective locations to participate
in a collaborative session regarding content of a video in accordance with claim 25
wherein when any of such tools are invoked, they will be activated on the side, while a
collaborative session is permitted to continue.


27. Apparatus for enabling a plurality of users at respective locations to participate 
in a collaborative session regarding content of a video in accordance with claim 22 
wherein said collaborative session activities including joint voice comments can be 
recorded for later playback. 


28. Apparatus for enabling a plurality of users at respective locations to participate 
in a collaborative session regarding content of a video in accordance with claim 27 
wherein said activities are time stamped to support synchronized playback. 

29. Apparatus for enabling a plurality of users at respective locations to participate
in a collaborative session regarding content of a video in accordance with claim 28
wherein such a recorded session can be loaded and played synchronously on screens of
all users with multi-point VCR control.

30. Apparatus for enabling a plurality of users at respective locations to participate
in a collaborative session regarding content of a video in accordance with claim 29
wherein recorded joint voice comments are mixed with the audio track of a video
during playback.

31. Apparatus for enabling a plurality of users at respective locations to participate
in a collaborative session regarding content of a video in accordance with claim 30
wherein any user can still annotate during the playback of a recorded annotation in a
collaborative session with separate VCR controls to form a new annotation.

32. Apparatus for enabling a plurality of users at respective locations to participate
in a collaborative session regarding content of a video in accordance with claim 31
wherein said new annotation together with playback of a recorded annotation can again
be recorded.